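In this work, we address the problem of generating speech from silent lip videos for any speaker in the wild. In stark contrast to previous works, our method (i) is not restricted to a fixed number of speakers, (ii) does not explicitly impose constraints on the domain or the vocabulary, and (iii) deals with videos recorded in the wild, as opposed to within laboratory settings. The task presents a host of challenges, the key one being that many features of the desired target speech, such as voice, pitch, and linguistic content, cannot be entirely inferred from the silent face video. To handle these stochastic variations, we propose a new VAE-GAN architecture that learns to associate the lip and speech sequences amid the variations. With the help of multiple powerful discriminators that guide the training process, our generator learns to synthesize speech sequences in any voice for the lip movements of any person. Extensive experiments on multiple datasets show that we outperform all baselines by a large margin. Further, our network can be fine-tuned on videos of specific identities to achieve performance comparable to single-speaker models trained on 4× more data. We conduct numerous ablation studies to analyze the effect of the different modules of our architecture. We also provide a demo video showcasing several qualitative results, along with the code and trained models, on our website: …-synthesis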
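The abstract does not detail the architecture, but the "VAE" half of a VAE-GAN usually rests on two generic pieces: the reparameterization trick, which lets a latent code absorb variation the lips cannot determine (voice, pitch), and a KL penalty toward a prior. A minimal NumPy sketch of just those pieces, assuming a diagonal-Gaussian posterior and a standard-normal prior; the function names and the latent size are illustrative, and the discriminators are not shown. This is not the authors' model.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((2, 8))        # hypothetical batch of 2 clips with an 8-dim latent
log_var = np.zeros((2, 8))
z = reparameterize(mu, log_var, rng)
print(z.shape)                             # (2, 8)
print(kl_to_standard_normal(mu, log_var))  # [0. 0.]
```

When the posterior already equals the prior (zero mean, unit variance) the KL term vanishes; moving the mean away from zero makes it grow quadratically.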
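Recently, sign language researchers have turned to sign-language-interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available, large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly aligned data found only sparse correspondences between keywords in the subtitles and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are as follows: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4) on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.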
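The abstract does not give the exact filtering rule, but one plausible reading of "pseudo-labelling as sign spotting" combined with synonym expansion is: keep a sign recognition model's top-1 prediction for a temporal window only if it is confident and the predicted word (or a synonym) occurs in the time-aligned subtitle. A minimal sketch under that assumption; `expand_with_synonyms`, `spot_signs`, the 0.5 threshold, and the synonym table are all hypothetical.

```python
import numpy as np

def expand_with_synonyms(words, synonyms):
    """Add every listed synonym of each subtitle word (hypothetical synonym table)."""
    expanded = set(words)
    for w in words:
        expanded.update(synonyms.get(w, ()))
    return expanded

def spot_signs(window_probs, vocab, subtitle_words, threshold=0.5):
    """Keep (window, word, prob) when the top-1 prediction is confident and appears in the subtitle.

    window_probs:   (num_windows, num_classes) softmax outputs of a sign recognition model
    vocab:          class index -> word
    subtitle_words: words of the time-aligned subtitle, possibly synonym-expanded
    """
    annotations = []
    for t, cls in enumerate(window_probs.argmax(axis=1)):
        p = float(window_probs[t, cls])
        if p >= threshold and vocab[cls] in subtitle_words:
            annotations.append((t, vocab[cls], p))
    return annotations

vocab = ["house", "car", "tree"]
probs = np.array([[0.90, 0.05, 0.05],   # confident "house"
                  [0.20, 0.70, 0.10],   # confident "car", but "car" is not in the subtitle
                  [0.40, 0.30, 0.30]])  # top-1 not confident enough
words = expand_with_synonyms({"home"}, {"home": ["house"]})
print(spot_signs(probs, vocab, words))  # [(0, 'house', 0.9)]
```

Here the subtitle says "home", and only the synonym expansion lets the confident "house" prediction in window 0 become an annotation.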
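The goal of this paper is to learn strong lip reading models that can recognize speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques on top of trivially pooled visual features. Instead, in this paper we focus on the unique challenges encountered in lip reading and propose tailored solutions. To this end, we make the following contributions: (1) we propose an attention-based pooling mechanism to aggregate visual speech representations; (2) we use sub-word units for lip reading for the first time and show that this allows us to better model the ambiguities of the task; (3) we propose a model for Visual Speech Detection (VSD), trained on top of the lip reading network. As a result, we obtain state-of-the-art results on the LRS2 and LRS3 benchmarks when training on public datasets, and even surpass models trained on large-scale industrial datasets while using an order of magnitude less data. Our best model achieves a 22.6% word error rate on the LRS2 dataset, a performance unprecedented for lip reading models, significantly narrowing the gap between lip reading and automatic speech recognition. Moreover, on the AVA-ActiveSpeaker benchmark, our VSD model surpasses all visual-only baselines and even outperforms several recent audio-visual methods.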
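To contrast attention-based pooling with trivial (average) pooling: instead of averaging a frame's spatial features, a query vector scores each spatial position and the softmax-normalized scores weight the sum. A minimal NumPy sketch, assuming a single query and scaled dot-product scoring; the paper's actual module (number of heads, how the query is learned) is not specified in the abstract, and `attention_pool` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, query):
    """Pool per-frame spatial features into one vector per frame.

    features: (T, N, D) visual features, N spatial positions per frame
    query:    (D,) query vector (learned in practice; fixed here for illustration)
    returns:  pooled features (T, D) and attention weights (T, N)
    """
    d = features.shape[-1]
    scores = features @ query / np.sqrt(d)              # (T, N) scaled dot-product scores
    weights = softmax(scores, axis=-1)                  # each frame's weights sum to 1
    pooled = np.einsum("tn,tnd->td", weights, features)
    return pooled, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 16, 32))   # 5 frames, 4x4 spatial grid, 32 channels
pooled, w = attention_pool(feats, rng.standard_normal(32))
print(pooled.shape, w.shape)               # (5, 32) (5, 16)
```

With an all-zero query every position gets the same weight and the module reduces exactly to average pooling, which is why it can be seen as a strict generalization of the trivial baseline.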
Real-world datasets exhibit imbalances of varying types and degrees. Techniques based on re-weighting and margin adjustment of the loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of the Hessian of the class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further demonstrate, theoretically and empirically, that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to a flat minimum, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
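The SAM update itself is short: take a normalized gradient-ascent step of radius rho to an approximate worst-case point nearby, then descend from the original weights using the gradient measured there. A minimal NumPy sketch on a toy quadratic with one flat and one sharp direction; the class-wise re-weighting, the network, and the hyperparameters used in the paper are not shown, and `sam_step`, `lr`, and `rho` values are illustrative.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization step.

    1. Perturb w to w + e, the (first-order) worst-case point in an L2 ball of radius rho.
    2. Update the ORIGINAL w with the gradient evaluated at w + e.
    """
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + eps)   # eps avoids division by zero at a stationary point
    return w - lr * grad_fn(w + e)

# Toy loss f(w) = 0.5 * w^T A w: curvature 1 along the first axis, 10 along the second.
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad)
print(loss(np.array([1.0, 1.0])), loss(w))
```

Each step costs two gradient evaluations instead of one. With a fixed rho the iterates settle in a small neighbourhood of the minimum rather than exactly on it, because the perturbation never shrinks to zero.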
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.
Machine learning models capable of handling the large datasets collected in the financial world can often become black boxes that are expensive to run. The quantum computing paradigm suggests new optimization techniques that, combined with classical algorithms, may deliver competitive, faster, and more interpretable models. In this work we propose a quantum-enhanced machine learning solution for the prediction of credit rating downgrades, also known as fallen-angels forecasting in the financial risk management field. We implement this solution on a neutral-atom Quantum Processing Unit with up to 60 qubits on a real-life dataset. We report competitive performance against the state-of-the-art Random Forest benchmark, while our model achieves better interpretability and comparable training times. We examine how to improve performance in the near term and validate our ideas with Tensor Network-based numerical simulations.
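Generative models such as DALL-E 2 could represent promising future tools for image generation, augmentation, and manipulation in artificial intelligence research in radiology, provided that these models have sufficient medical domain knowledge. Here we show that DALL-E 2 has learned relevant representations of X-ray images, with promising capabilities in zero-shot text-to-image generation of new images, continuation of an image beyond its original boundaries, and removal of elements, while the generation of pathology or of CT, MRI, and ultrasound images remains limited. The use of generative models for augmenting and generating radiological data therefore seems feasible, even if further fine-tuning and adaptation of these models to the respective domain is required beforehand.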
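Interactionist models introduce a dynamic approach to language, communication, and cognition. In this work, we explore this fundamental theory in the context of dialogue modelling for spoken dialogue systems (SDS). To extend this theoretical framework, we propose a set of design principles that adhere to central psycholinguistic and communication theories in order to achieve interactionism in SDS. Key ideas drawn from these theories form the basis of our proposed design principles.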
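Developing an effective automatic classifier to separate genuine sources from artifacts is essential for transient follow-ups in wide-field optical surveys. Identifying transient detections among the subtraction artifacts left after the image differencing process is a key step for such classifiers, known as the real-bogus classification problem. We apply a self-supervised machine learning model, the deep self-organizing map (DESOM), to this "real-bogus" classification problem. DESOM combines an autoencoder and a self-organizing map to perform clustering, distinguishing between real and bogus detections based on their dimensionality-reduced representations. We use 32×32 normalized detection thumbnails as the input to DESOM. We demonstrate different model training approaches and find that our best DESOM classifier shows a missed detection rate of 6.6% with a false positive rate of 1.5%. When used in combination with other types of classifiers, such as those built on neural networks or decision trees, DESOM offers a more nuanced way to fine-tune the decision boundary that identifies likely real detections. We also discuss other potential uses of DESOM and its limitations.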
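DESOM trains the autoencoder and the self-organizing map jointly; the sketch below isolates only the SOM part: find the best-matching unit (BMU) for a latent code and pull that unit and its map neighbours toward the code. It also shows how a missed-detection rate and a false-positive rate are computed for a real/bogus classifier. Assumptions: Gaussian neighbourhood, fixed learning rate and width, label 1 = real; `som_step` and `rates` are hypothetical names, and this is not the DESOM training loop.

```python
import numpy as np

def som_step(prototypes, grid, z, lr=0.5, sigma=1.0):
    """One online SOM update on a latent code z.

    prototypes: (K, D) prototype vector of each map unit
    grid:       (K, 2) coordinates of each unit on the 2-D map
    z:          (D,) input vector (in DESOM, the autoencoder's latent code)
    """
    bmu = int(np.argmin(((prototypes - z) ** 2).sum(axis=1)))   # best-matching unit
    grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)              # squared map distance to BMU
    h = np.exp(-grid_d2 / (2 * sigma**2))                       # Gaussian neighbourhood weights
    return prototypes + lr * h[:, None] * (z - prototypes), bmu

def rates(y_true, y_pred):
    """Missed-detection rate (real predicted bogus) and false-positive rate (bogus predicted real)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mdr = np.mean(y_pred[y_true == 1] == 0)
    fpr = np.mean(y_pred[y_true == 0] == 1)
    return float(mdr), float(fpr)

grid = np.array([[i, j] for i in range(2) for j in range(2)])   # 2x2 map
protos = np.zeros((4, 2))
protos[3] = [1.0, 1.0]
new_protos, bmu = som_step(protos, grid, np.array([0.9, 0.9]))
print(bmu)                                  # 3
print(rates([1, 1, 0, 0], [1, 0, 0, 1]))   # (0.5, 0.5)
```

Once the map is trained, each unit can be labelled real or bogus (for example by the majority label of training samples mapped to it), and moving that labelling threshold is one way to trade missed detections against false positives.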
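Existing data-driven and feedback traffic control strategies do not consider the heterogeneity of real-time data measurements. Moreover, traditional reinforcement learning (RL) methods for traffic control usually converge slowly because they lack data efficiency. In addition, conventional optimal perimeter control schemes require exact knowledge of the system dynamics and are therefore fragile to endogenous uncertainties. To address these challenges, this work proposes an integral reinforcement learning (IRL) based approach to learning the macroscopic traffic dynamics for adaptive optimal perimeter control. This work makes the following primary contributions to the transportation literature: (a) a continuous-time control with discrete gain updates is developed to accommodate discrete-time sensor data; (b) to reduce sampling complexity and use the available data more efficiently, the experience replay (ER) technique is introduced into the IRL algorithm; (c) the proposed method relaxes the requirement for model calibration in a "model-free" manner, which provides robustness to modeling uncertainty and enhances real-time performance through a data-driven RL algorithm; (d) the convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via Lyapunov theory. The optimal control law is parameterized and then approximated by neural networks (NNs), which moderates the computational complexity. Both state and input constraints are considered, while no model linearization is required. Numerical examples and simulation experiments are presented to verify the effectiveness and efficiency of the proposed method.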
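In the standard integral reinforcement learning formulation, the value function satisfies V(x(t)) = ∫_t^{t+T} r dτ + V(x(t+T)), so with a linear-in-weights critic V(x) ≈ wᵀφ(x) every measured interval gives one linear equation wᵀ(φ(x(t)) − φ(x(t+T))) = ∫ r dτ. Experience replay stores these interval samples and reuses them in a batch least-squares fit instead of discarding each after one update. The sketch below assumes a 2-D state, a quadratic basis `phi`, and synthetic data; `ReplayBuffer` and `evaluate_critic` are hypothetical names, and it covers only critic evaluation, not the paper's perimeter controller, constraints, or NN parameterization.

```python
import random
from collections import deque

import numpy as np

def phi(x):
    """Quadratic basis for V(x) ~ w @ phi(x) (an assumed choice for a 2-D state)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

class ReplayBuffer:
    """Fixed-size store of (x(t), x(t+T), integral cost) samples; oldest drop out when full."""
    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, x_t, x_next, integral_cost):
        self.data.append((np.asarray(x_t, float), np.asarray(x_next, float), float(integral_cost)))

    def sample(self, k):
        return self.rng.sample(list(self.data), min(k, len(self.data)))

def evaluate_critic(samples):
    """Least-squares w for  w @ (phi(x_t) - phi(x_{t+T})) = integral cost  over replayed samples."""
    A = np.array([phi(x) - phi(xn) for x, xn, _ in samples])
    b = np.array([c for _, _, c in samples])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Synthetic check: generate interval data consistent with a known critic, then recover it.
rng = np.random.default_rng(0)
w_true = np.array([1.0, 0.5, 2.0])
buf = ReplayBuffer(capacity=100, seed=0)
for _ in range(50):
    x = rng.standard_normal(2)
    xn = 0.8 * x   # hypothetical stable closed loop shrinking the state over each interval
    buf.add(x, xn, w_true @ (phi(x) - phi(xn)))
w_hat = evaluate_critic(buf.sample(20))
print(np.round(w_hat, 6))
```

Because each stored interval can be reused in many batches, the critic needs fewer fresh measurements, which is the data-efficiency argument for adding experience replay.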